Search | BVS Regional Portal

OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization.

Ahdritz, Gustaf; Bouatta, Nazim; Floristean, Christina; Kadyan, Sachin; Xia, Qinghui; Gerecke, William; O'Donnell, Timothy J; Berenberg, Daniel; Fisk, Ian; Zanichelli, Niccolò; Zhang, Bo; Nowaczynski, Arkadiusz; Wang, Bei; Stepniewska-Dziubinska, Marta M; Zhang, Shang; Ojewole, Adegoke; Guney, Murat Efe; Biderman, Stella; Watkins, Andrew M; Ra, Stephen; Lorenzo, Pablo Ribalta; Nivon, Lucas; Weitzner, Brian; Ban, Yih-En Andrew; Chen, Shiyang; Zhang, Minjia; Li, Conglong; Song, Shuaiwen Leon; He, Yuxiong; Sorger, Peter K; Mostaque, Emad; Zhang, Zhao; Bonneau, Richard; AlQuraishi, Mohammed.

Nat Methods ; 2024 May 14.

Article in English | MEDLINE | ID: mdl-38744917

ABSTRACT

AlphaFold2 revolutionized structural biology with its ability to predict protein structures with exceptionally high accuracy. Its implementation, however, lacks the code and data required to train new models. These are necessary to (1) tackle new tasks, like protein-ligand complex structure prediction, (2) investigate the process by which the model learns, and (3) assess the model's capacity to generalize to unseen regions of fold space. Here we report OpenFold, a fast, memory-efficient and trainable implementation of AlphaFold2. We train OpenFold from scratch, matching the accuracy of AlphaFold2. Having established parity, we find that OpenFold is remarkably robust at generalizing even when the size and diversity of its training set are deliberately limited, including near-complete elisions of classes of secondary structure elements. By analyzing intermediate structures produced during training, we also gain insights into the hierarchical manner in which OpenFold learns to fold. In sum, our studies demonstrate the power and utility of OpenFold, which we believe will prove to be a crucial resource for the protein modeling community.

Masked inverse folding with sequence transfer for protein representation learning.

Yang, Kevin K; Zanichelli, Niccolò; Yeh, Hugh.

Protein Eng Des Sel ; 36, 2023 Jan 21.

Article in English | MEDLINE | ID: mdl-37883472

ABSTRACT

Self-supervised pretraining on protein sequences has led to state-of-the-art performance on protein function and fitness prediction. However, sequence-only methods ignore the rich information contained in experimental and predicted protein structures. Meanwhile, inverse folding methods reconstruct a protein's amino-acid sequence given its structure, but do not take advantage of sequences that do not have known structures. In this study, we train a masked inverse folding model: a protein masked language model parameterized as a structured graph neural network. During pretraining, this model learns to reconstruct corrupted sequences conditioned on the backbone structure. We then show that using the outputs from a pretrained sequence-only protein masked language model as input to the inverse folding model further improves pretraining perplexity. We evaluate both models on downstream protein engineering tasks and analyze how using information from experimental or predicted structures affects performance.

Subjects

Protein Engineering, Protein Folding, Amino Acid Sequence
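The pretraining objective summarized in the second abstract (reconstructing corrupted sequences, with progress measured by perplexity) can be illustrated with a small Python sketch. It is not the authors' code: the 15% selection rate, the 80/10/10 mask/random/unchanged split borrowed from BERT, the `#` mask token, and the function names `corrupt` and `perplexity` are all assumptions made for illustration. The structure-conditioned graph neural network is omitted; only the corruption step and the perplexity metric are shown.

```python
import math
import random

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical mask token; any symbol outside the alphabet would do.
MASK = "#"


def corrupt(seq, rate=0.15, rng=None):
    """Corrupt a sequence BERT-style (assumed scheme, not from the paper).

    Each position is selected with probability `rate`. A selected residue is
    replaced by MASK 80% of the time, by a random amino acid 10% of the time,
    and left unchanged 10% of the time. Returns the corrupted string and a
    dict mapping selected positions to their original residues (the targets
    the model must reconstruct).
    """
    if rng is None:
        rng = random.Random(0)
    tokens = list(seq)
    targets = {}
    for i, aa in enumerate(seq):
        if rng.random() < rate:
            targets[i] = aa
            r = rng.random()
            if r < 0.8:
                tokens[i] = MASK
            elif r < 0.9:
                tokens[i] = rng.choice(AMINO_ACIDS)
            # otherwise keep the original residue
    return "".join(tokens), targets


def perplexity(probs_of_true):
    """Perplexity = exp(mean negative log-likelihood).

    `probs_of_true` holds the probability the model assigned to the true
    residue at each corrupted position.
    """
    if not probs_of_true:
        raise ValueError("need at least one corrupted position")
    nll = -sum(math.log(p) for p in probs_of_true) / len(probs_of_true)
    return math.exp(nll)


masked, targets = corrupt("MKTAYIAKQRQISFVKSHFSRQ", rate=0.3)
print(masked, sorted(targets))
```

For example, a model that assigns probability 1/20 to every true residue has a perplexity of 20, the number of standard amino acids; lower values mean the model reconstructs corrupted residues more confidently.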
